Statistical Disclosure Control for Tabular Data in R
Abstract
Statistical disclosure control (SDC) for tabular data is challenging because every suppressed cell of a table must have a sufficiently wide confidentiality interval despite the linear relations among cell variables. However, we find that the existing SDC tool (i.e., τ-ARGUS) does not effectively support the output checking process of the on-site use program in Japan. We therefore develop a new SDC tool in R that produces safe tabular data together with the auxiliary information an output checker needs to verify its safety. In this paper, we describe the major features of our SDC tool and discuss possible future extensions. The tool performs primary suppression on frequency tables with the minimum frequency rule and on magnitude tables with an occupancy rule (e.g., the (n,k)-rule). We implement optimal secondary suppression based on Benders decomposition.
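The two primary suppression rules named in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R implementation. The function names, the threshold of 3, the parameters n = 2 and k = 85, the choice to leave empty cells unflagged, and the toy tables are all assumptions. The (n,k)-rule is taken here in its usual form, often called a dominance rule: a cell is sensitive when its n largest contributions exceed k percent of the cell total.

```python
from typing import Sequence


def violates_min_frequency(count: int, threshold: int = 3) -> bool:
    """Minimum frequency rule: a nonempty cell with fewer than
    `threshold` contributors is sensitive (threshold is an assumption)."""
    # Assumption: empty cells are not flagged; agencies differ on this.
    return 0 < count < threshold


def violates_nk_rule(contributions: Sequence[float],
                     n: int = 2, k: float = 85.0) -> bool:
    """(n,k)-rule: sensitive if the n largest contributions exceed
    k percent of the cell total (n and k are assumptions)."""
    total = sum(contributions)
    if total <= 0:
        return False
    top_n = sum(sorted(contributions, reverse=True)[:n])
    return top_n > (k / 100.0) * total


# Hypothetical frequency table: cell label -> number of contributors.
freq_table = {"north": 12, "south": 2, "east": 0, "west": 7}
primary_freq = sorted(
    cell for cell, count in freq_table.items()
    if violates_min_frequency(count)
)

# Hypothetical magnitude table: cell label -> individual contributions.
mag_table = {
    "retail": [50.0, 30.0, 20.0, 10.0],
    "mining": [90.0, 5.0, 5.0],
    "farming": [40.0, 35.0, 25.0],
}
primary_mag = sorted(
    cell for cell, values in mag_table.items()
    if violates_nk_rule(values)
)

print(primary_freq)
print(primary_mag)
```

Cells flagged this way are only the primary suppressions. Because row and column totals link cells through linear relations, further cells must be suppressed so that the flagged values cannot be recovered by subtraction; that secondary suppression step, which the paper solves with Benders decomposition, is not covered by this sketch.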
Similar resources
Statistical disclosure control in tabular data
Data disseminated by National Statistical Agencies (NSAs) can be classified as either microdata or tabular data. Tabular data is obtained from microdata by crossing one or more categorical variables. Although cell tables provide aggregated information, they also need to be protected. This chapter is a short introduction to tabular data protection. It contains three main sections. The first one ...
Maximum Utility-Minimum Information Loss Table Server Design for Statistical Disclosure Control of Tabular Data
Statistical agencies typically serve a diverse group of end users with varying information needs. Accommodating the conflicting needs for information in combination with stringent rules for statistical disclosure limitation (SDL) of statistical information creates a special challenge. We provide a generic table server design for SDL of tabular data to meet this challenge. Our table server desig...
Information-Theoretic Disclosure Risk Measures in Statistical Disclosure Control of Tabular Data
Statistical database protection is a part of information security which tries to prevent published statistical information (tables, individual records) from disclosing the contribution of specific respondents. This paper shows how to use information-theoretic concepts to measure disclosure risk for tabular data. The proposed disclosure risk measure is compatible with a broad class of disclosure...
Tabular Statistical Disclosure Control: Optimization Techniques in Suppression and Controlled Tabular Adjustment
The problem of disseminating tabular data such that the amount of information provided satisfies the public need while protecting individually identifiable data is a problem in all governmental statistical agencies. The problem falls into the category of Statistical Disclosure Control and provides many difficult policy and technical challenges for these agencies. In order to achieve the double ...
A posteriori Disclosure Risk Measure for Tabular Data Based on Conditional Entropy
Statistical database protection, also known as Statistical Disclosure Control (SDC), is a part of information security which tries to prevent published statistical information (tables, individual records) from disclosing the contribution of specific respondents. This paper deals with the assessment of the disclosure risk associated to the release of tabular data. So-called sensitivity rules are...